Using Crowdsourcing to Improve Profanity Detection

Authors

Sara Owsley Sood

Judd Antin

Elizabeth F. Churchill

Abstract

Profanity detection is often thought to be an easy task. However, past work has shown that current list-based systems perform poorly. First, they fail to adapt to evolving profane slang and fail to identify profane terms that have been disguised or only partially censored (e.g., @ss, f$#%) or that are intentionally or unintentionally misspelled (e.g., biatch, shiiiit). For these reasons, they are easy to circumvent and have very poor recall. Second, they are a one-size-fits-all solution: they assume that the definition, use, and perception of profane or inappropriate language hold across all contexts. In this article, we present work that attempts to move beyond list-based profanity detection systems by identifying the context in which profanity occurs. The proposed system uses a set of comments from a social news site, labeled by Amazon Mechanical Turk workers for the presence of profanity. This system far surpasses the performance of list-based profanity detection techniques. The use of crowdsourcing in this task suggests an opportunity to build profanity detection systems tailored to sites and communities.
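
The recall problem described in the abstract can be illustrated with a minimal sketch of a list-based detector. This is not the authors' system or any particular product: the blocklist, the mild placeholder terms, and the function name `list_based_is_profane` are all assumptions made for illustration. The sketch shows that exact token matching catches the listed spelling but misses a symbol-substituted variant and an elongated variant.

```python
import re

# Hypothetical blocklist with mild placeholder terms (an assumption for
# illustration; a real list would be far longer and site-specific).
BLOCKLIST = {"darn", "heck"}


def list_based_is_profane(comment: str) -> bool:
    """Flag a comment if any alphabetic token exactly matches the blocklist."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return any(token in BLOCKLIST for token in tokens)


examples = [
    "well darn",        # exact match: flagged
    "well d@rn",        # symbol substitution splits the token: missed
    "what the heeeck",  # elongated spelling is not on the list: missed
]
results = {text: list_based_is_profane(text) for text in examples}
```

Adding every variant to the list does not scale, since new disguises and misspellings can be produced faster than the list can be updated. That is the limitation that motivates learning from labeled comments instead.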

Similar resources

Rephrasing Profanity in Chinese Text

This paper proposes a system that can detect and rephrase profanity in Chinese text. Rather than just masking detected profanity, we want to revise the input sentence by using inoffensive words while keeping its original meaning. Twenty-nine such rephrasing rules were devised after observing sentences on real-world social websites. The overall accuracy of the proposed system is 85.56%.

Good Clean Fun? A Content Analysis of Profanity in Video Games and Its Prevalence across Game Systems and Ratings

Although violent video game content and its effects have been examined extensively by empirical research, verbal aggression in the form of profanity has received less attention. Building on preliminary findings from previous studies, an extensive content analysis of profanity in video games was conducted using a sample of the 150 top-selling video games across all popular game platforms (includ...

Profanity in media associated with attitudes and behavior regarding profanity use and aggression.

OBJECTIVE: We hypothesized that exposure to profanity in media would be directly related to beliefs and behavior regarding profanity and indirectly to aggressive behavior. METHODS: We examined these associations among 223 adolescents attending a large Midwestern middle school. Participants completed a number of questionnaires examining their exposure to media, attitudes and behavior regarding p...

Perform Three Data Mining Tasks with Crowdsourcing Process

In data mining studies, performing feature selection by hand is complex, so some of the labeling needs to be sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the users' age or place of residence. Uncertainty about the performance of virtual user...

Analyzing Labeled Cyberbullying Incidents on the Instagram Social Network

Cyberbullying is a growing problem affecting more than half of all American teens. The main goal of this paper is to study labeled cyberbullying incidents in the Instagram social network. In this work, we have collected a sample data set consisting of Instagram images and their associated comments. We then designed a labeling study and employed human contributors at the crowd-sourced CrowdFlowe...

Publication date: 2012
